Cross-genre Document Retrieval: Matching between Conversational and Formal Writings
Authors
Abstract
This paper tackles a cross-genre document retrieval task in which the queries are in formal writing and the target documents are in conversational writing. In this task, a query is a sentence extracted from either a summary or a plot of an episode of a TV show, and the target document consists of transcripts from the corresponding episode. To establish a strong baseline, we employ a current state-of-the-art search engine to perform document retrieval on the dataset collected for this work. We then introduce a structure reranking approach that improves the initial ranking by using syntactic and semantic structures generated by NLP tools. Our evaluation shows an improvement of more than 4% when structure reranking is applied, a promising result.
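The two-stage idea in the abstract (an initial ranking followed by a structure-based reranking) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the paper's implementation: the first stage is a simple TF-IDF scorer standing in for the search engine, and the hypothetical `structure_score` uses overlap of adjacent word pairs as a crude stand-in for the syntactic and semantic structures that NLP tools would produce. The names `tfidf_scores`, `structure_score`, `rerank`, `top_k`, and `weight` are likewise invented here.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on any non-alphanumeric character.
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def tfidf_scores(query, docs):
    # First stage (assumed stand-in for a search engine):
    # sum of length-normalized term frequency times a smoothed IDF.
    doc_counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    query_terms = set(tokenize(query))
    scores = []
    for counts in doc_counts:
        total = sum(counts.values()) or 1
        score = 0.0
        for term in query_terms:
            df = sum(1 for c in doc_counts if term in c)
            if df:
                idf = math.log((n + 1) / (df + 0.5))
                score += (counts[term] / total) * idf
        scores.append(score)
    return scores


def word_pairs(text):
    # Adjacent word pairs: a crude, assumed proxy for structural relations.
    toks = tokenize(text)
    return set(zip(toks, toks[1:]))


def structure_score(query, doc):
    # Hypothetical structure score: fraction of query word pairs found in the doc.
    q = word_pairs(query)
    if not q:
        return 0.0
    return len(q & word_pairs(doc)) / len(q)


def rerank(query, docs, top_k=10, weight=0.5):
    # Second stage: rescore only the top_k first-stage results,
    # leaving the remaining documents in their first-stage order.
    base = tfidf_scores(query, docs)
    order = sorted(range(len(docs)), key=lambda i: base[i], reverse=True)
    head = sorted(
        order[:top_k],
        key=lambda i: base[i] + weight * structure_score(query, docs[i]),
        reverse=True,
    )
    return head + order[top_k:]


# Toy data: a formal query against conversational "transcripts".
docs = [
    "Well, I mean, the ring was lost, and then someone found Ross crying.",
    "Ross lost the ring! Can you believe it?",
    "We should get coffee.",
]
query = "Ross lost the ring"
ranking = rerank(query, docs)
```

Here `ranking` is a list of document indices, best first. In a real system the word-pair overlap would be replaced by a comparison of structures from a parser or semantic analyzer, and `weight` would be tuned on held-out data.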
Related resources
Genre Classification of Web Documents
Retrieving relevant documents over the Web is an overwhelming task when search engines return thousands of Web documents. Sifting through these documents is time-consuming and sometimes leads to an unsuccessful search. One problem is that most search engines rely on matching a query to documents based solely on topical keywords. However, many users of search engines have a particular genre in m...
Improved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function, and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...
A Genre-Based Investigation of Inter/Intra-Lingual Relationships between Persian and English Academic Writings: Common Underlying Proficiency Oriented
Although L2 writing has attracted salient attention and monopolized many studies in EFL contexts, there is still no full image of its complicated nature. Trying to play a supplementary role in achieving that image, this study aimed at finding whether Persian and English argumentative and descriptive academic writings were inter/intra-lingually associated and if genre played a role in p...
Matching Verb Attributes for Cross-Document Event Coreference
Collateral texts of different genre can describe the same filmed story, e.g. audio description and plot summaries. We deal with the challenge of cross-document coreference for events by matching verb attributes. Cross document coreference is the task of deciding whether two linguistic descriptions from different sources refer to the same event. This is important for reliable information integra...
Journal title: CoRR
Volume: abs/1707.04538
Publication date: 2017